SQL Server 2008 R2 : Replication - The Publisher, Distributor, and Subscriber Magazine Metaphor

8/8/2011 5:55:37 PM

Any SQL Server can play up to three distinct roles in a data replication environment:

Publication server— The publication server (or publisher) contains the database or databases that will be published (the magazine!). This is the source of the data that is to be replicated to other servers. In Figure 1, the Customer table (an article in the magazine) in the AdventureWorks2008 database is the data to be published. To publish data, the database that contains the data that will be published must first be enabled for publishing.

Figure 1. The publisher, distributor, and one or more subscribers.
Distribution server— The distribution server (or distributor) can either be on the same server as the publication server or on a different server (in which case it is a remote distribution server). This server contains the distribution database. This database, also called the store-and-forward database, holds all the data changes that are to be forwarded from the published database to any subscription servers that subscribe to the data. A single distribution server can support several publication servers. The distribution server is truly the workhorse of data replication; it is essentially the mail system that picks up the magazine and delivers it to the subscription holder.
Subscription server— The subscription server (or subscriber) contains a copy of the database or portions of the database being published (for example, the Customer table in the AdventureWorks2008 database). The distribution server sends any changes made to this table (in the published database) to the subscription server’s copy of the Customer table. This is known as store-and-forward. Some data replication configurations send the data to the subscription server, and then the data is read-only. It is also possible for subscribers (known as updating subscribers) to make updates, which are sent back to the publisher. More on this in the Updating Subscribers Replication Model section.

There are now new variations of this update subscriber option called peer-to-peer replication. Peer-to-peer allows for more than one publisher of the same data (table) at the same time! Essentially, each publisher is also a subscriber at the same time (hence, peer-to-peer).

Along with enabling distinct server roles (publisher, distributor, and subscriber), Microsoft utilizes a few more magazine metaphors, including publications and articles. A publication is a group of one or more articles and is the basic unit of data replication. An article is simply a pointer to a single table, or a subset of rows or columns out of a table, that will be made available for replication.

Publications and Articles

A single database can contain more than one publication. You can publish data from tables, from database objects, from the execution of stored procedures, and even from schema objects, such as referential integrity constraints, clustered indexes, nonclustered indexes, user triggers, extended properties, and collation. Regardless of what you plan to replicate, all articles in a publication are synchronized at the same time. Figure 2 shows an example of a publication (named Cust_Orders publication) with three articles (three tables from the AdventureWorks2008 database). You can also choose to replicate whole tables or just parts of tables via filtering.

Figure 2. The Cust_Orders publication (in the AdventureWorks2008 database).

Filtering Articles

You can create articles within a publication in several different ways. The basic way to create an article is to publish all the columns and rows contained in a table. Although this is the easiest way to create articles, your business needs might require that you publish only specific columns or certain rows of a table. This is referred to as filtering vertically or horizontally. When you filter vertically, you filter only specific columns, whereas with horizontal filtering, you filter only specific rows. In addition, SQL Server 2008 provides the added functionality of join filters and dynamic filters.

As Figure 3 shows, you might need to replicate only a customer’s CustomerID, TerritoryID, and CustomerType to various subscribing servers around your company. In your company, the other data, such as AccountNumber, may be restricted information that should not be replicated for general use. For that reason, you simply create an article for data replication that contains a subset of the Customer table that will be replicated to these other locations and excludes AccountNumber (and rowguid and ModifiedDate as well).

Figure 3. Vertical filtering creates a subset of columns from a table to be replicated to subscribers.

As another example, you might need to publish only the Customer table data for a specific customer type, such as “individual” customers ((CustomerType = 'I') or customers that are “stores” (CustomerType = 'S'). This process, as shown in Figure 4, is known as horizontal filtering.

Figure 4. Horizontal filtering creates a subset of rows from a table to be replicated to subscribers.

It is possible to combine horizontal and vertical filtering, as shown in Figure 5 . This way, you can weed out unneeded columns and rows that aren’t required for replication (that is, are not needed by the subscribers). For example, you might need only the customers that are stores and need only CustomerID, TerritoryID, and CustomerType data to be published.

Figure 5. Combining horizontal and vertical filtering allows you to pare down the information in an article to only the important information needed by the subscribers.

As mentioned earlier, it is now possible to use join filters. Join filters enable you to use the values of one article (that is, values from a table) to determine what gets replicated from another article (that is, what values can be associated with another table) via a join. In other words, if you are publishing the Customer table data based on the customers that are stores, you can extend filtering (that is, a join filter) to replicate only those orders for these types of customers (as shown in Figure 6). This way, you replicate only orders for customers that are stores to a subscriber that needs to see only this filtered data. This type of replication can be efficient if it is done well.

Figure 6. Horizontal and Join publication: Joining customers that are stores (type “S”) to corresponding SalesOrderHeader rows.

You also can publish stored procedure executions, along with their parameters, as articles. This can be either a standard procedure execution article or a serializable procedure execution article. The difference is that the latter is executed as a serializable transaction; the serializable option is recommended because it replicates the procedure execution only if the procedure is executed within the context of a serializable transaction. If that same stored procedure is executed from outside a serializable transaction, changes to data in published tables are replicated as a series of DML statements. In general, replicating stored procedure executions gives you a major reduction in the number of SQL statements being replicated across the network versus standard DML statements.

For instance, if you wanted to update the Customer table for every customer via an UPDATE SQL statement, the resulting Customer table updates would be replicated as a large multi-step transaction involving at least 5,000 separate UPDATE statements at a minimum. This number of statements would significantly bog down your network. However, with stored procedure execution articles, only the execution of the stored procedure is replicated to the subscription server, and the stored procedure—not the numerous update statements—is executed on that subscription server. Figure 7 illustrates the difference in execution described earlier. Some subtleties when utilizing this type of data replication processing can’t be overlooked, such as making sure the published stored procedure behaves the same on the subscribing server side.

Figure 7. Comparison of stored procedure execution and standard SQL statement replication.

Many more data replication terms are presented in this article, but it is essential that you first learn about the different types of replication scenarios that can be built and the reasons any of them would be desired over the others. It is also worth noting that Microsoft SQL Server 2008 supports replication to and from many different “heterogeneous” data sources. In other words, OLE DB and ODBC data sources can subscribe to SQL Server publications, and they can receive data replicated from a number of data sources, including Microsoft Exchange, Microsoft Access, Oracle, and DB2.

Other -----------------

- SQL Server 2008 R2 : Replication - What Is Replication?

- SQL Server 2008 High Availability : Other HA Techniques That Yield Great Results & High Availability from the Windows Server Family Side

- SQL Server 2008 High Availability : Building Solutions with One or More HA Options

- SQL Server 2008 High Availability : The Fundamentals of HA

- Administering SQL Server 2008 with PowerShell : Step-By-Step Examples (part 4)

- Administering SQL Server 2008 with PowerShell : Step-By-Step Examples (part 3) - Performing a Database Backup

- Administering SQL Server 2008 with PowerShell : Step-By-Step Examples (part 2) - Common OS-Related Tasks

- Administering SQL Server 2008 with PowerShell : Step-By-Step Examples (part 1) - General Tasks & Scheduling Scripts

- PowerShell in SQL Server 2008

- Protecting SQL Server Data : SCHEMA ARCHITECTURE STRATEGIES - Using Database Object Schemas

- Protecting SQL Server Data : SCHEMA ARCHITECTURE STRATEGIES - Protection via Normalization

- Troubleshooting and Optimizing SQL Server 2005 : Server Configuration Maintenance

- Troubleshooting and Optimizing SQL Server 2005 : Tuning the Database Structure

- Troubleshooting and Optimizing SQL Server 2005 : Data Analysis and Problem Diagnosis

- SQL Injection Attacks and Defense : Exploiting the Operating System - Consolidating Access

- SQL Injection Attacks and Defense : Executing Operating System Commands

- Administering SQL Server 2008 with PowerShell : PowerShell Scripting Basics (part 2)

- Administering SQL Server 2008 with PowerShell : PowerShell Scripting Basics (part 1)

- Administering SQL Server 2008 with PowerShell : Overview of PowerShell

- SQL Server 2008 Scheduling and Notification : Scripting Jobs and Alerts, Multiserver Job Management & Event Forwarding